Position heaps: A simple and dynamic text indexing data structure

Authors

  • Andrzej Ehrenfeucht
  • Ross M. McConnell
  • Nissa Osheim
  • Sung-Whan Woo
Abstract

We address the problem of finding the locations of all instances of a string P in a text T, where preprocessing of T is allowed in order to facilitate the queries. Previous data structures for this problem include the suffix tree, the suffix array, and the compact DAWG. We modify a data structure called a sequence tree, which was proposed by Coffman and Eve for hashing [1], and adapt it to the new problem. We can then produce a list of k occurrences of any string P in T in O(|P| + k) time. Because of properties shared by suffixes of a text that are not shared by arbitrary hash keys, we can build the structure in O(|T|) time, which is much faster than Coffman and Eve's algorithm. These bounds are as good as those for the suffix tree, suffix array, and the compact DAWG. The advantages are the elementary nature of some of the algorithms for constructing and using the data structure and the asymptotic bounds we can give for updating the data structure when the text is edited.
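
To make the idea concrete, here is a minimal Python sketch of a naive position heap. It is an illustration under stated assumptions, not the linear-time construction or the O(|P| + k) query claimed in the abstract: suffixes are inserted from right to left, each insertion walks as far down the trie as it can and then adds one new node, and a query verifies candidates on the search path by direct comparison. The names `Node`, `build_position_heap`, and `find_occurrences` are hypothetical and do not come from the paper.

```python
class Node:
    """One heap node; `pos` is the text position it stores (None for the root)."""
    def __init__(self, pos=None):
        self.pos = pos
        self.children = {}


def build_position_heap(text):
    """Naive construction: insert suffixes from shortest to longest.

    Each suffix follows existing edges as far as possible, then adds exactly
    one new child. The heap built from shorter suffixes is never deeper than
    their lengths, so the walk always stops before the suffix is exhausted.
    """
    root = Node()
    for i in range(len(text) - 1, -1, -1):
        node, j = root, i
        while j < len(text) and text[j] in node.children:
            node = node.children[text[j]]
            j += 1
        node.children[text[j]] = Node(i)
    return root


def find_occurrences(root, text, pattern):
    """Return the sorted start positions of `pattern` in `text`.

    Nodes passed before the pattern is fully consumed are only candidates and
    are checked against the text. If the whole pattern is consumed, the node
    reached and all of its descendants are occurrences.
    """
    hits = []
    node = root
    for depth, ch in enumerate(pattern):
        child = node.children.get(ch)
        if child is None:
            return sorted(hits)
        node = child
        # Path label is shorter than the pattern: verify directly.
        if depth + 1 < len(pattern) and text.startswith(pattern, node.pos):
            hits.append(node.pos)
    # Whole pattern matched: collect the subtree (skipping the position-less root).
    stack = [node]
    while stack:
        cur = stack.pop()
        if cur.pos is not None:
            hits.append(cur.pos)
        stack.extend(cur.children.values())
    return sorted(hits)


if __name__ == "__main__":
    text = "abaababa"
    heap = build_position_heap(text)
    print(find_occurrences(heap, text, "aba"))
```

In this sketch the verification step costs up to |P| comparisons per node on the path, so a query can take O(|P|^2 + k) time plus the final sort. The abstract's O(|P| + k) bound relies on additional machinery that is not reproduced here.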


Similar resources

The Position Heap of a Trie

The position heap is a text indexing structure for a single text string, recently proposed by Ehrenfeucht et al. [Position heaps: A simple and dynamic text indexing data structure, Journal of Discrete Algorithms, 9(1):100-121, 2011]. In this paper we introduce the position heap for a set of strings, and propose an efficient algorithm to construct the position heap for a set of strings which is ...


Position Heaps for Parameterized Strings

We propose a new indexing structure for parameterized strings, called parameterized position heap. Parameterized position heap is applicable for parameterized pattern matching problem, where the pattern matches a substring of the text if there exists a bijective mapping from the symbols of the pattern to the symbols of the substring. We propose an online construction algorithm of parameterized ...


Heaps Simplified

The heap is a basic data structure used in a wide variety of applications, including shortest path and minimum spanning tree algorithms. In this paper we explore the design space of comparison-based, amortized-efficient heap implementations. From a consideration of dynamic single-elimination tournaments, we obtain the binomial queue, a classical heap implementation, in a simple and natural way....


Verifying Heaps' law using Google Books Ngram data

This article is devoted to the verification of the empirical Heaps law in European languages using Google Books Ngram corpus data. The connection between word distribution frequency and expected dependence of individual word number on text size is analysed in terms of a simple probability model of text generation. It is shown that the Heaps exponent varies significantly within characteristic ti...


An Indexing Algorithm for Text Retrieval

The rapid growth of world-wide information systems results in new requirements for text indexing and retrieval. In this paper we propose an algorithm for query evaluation in text retrieval systems based on well-known inverted lists augmented with additional data structure and estimate expected performance gains. In addition to improved performance, this data structure is able to support dynamic...



Journal title:
  • J. Discrete Algorithms

Volume 9  Issue 

Pages  -

Publication date 2011